Halpern iteration






Faster Fixed-Point Methods for Multichain MDPs

Zurek, Matthew, Chen, Yudong

arXiv.org Machine Learning

We study value-iteration (VI) algorithms for solving general (a.k.a. multichain) Markov decision processes (MDPs) under the average-reward criterion, a fundamental but theoretically challenging setting. Beyond the difficulties inherent to all average-reward problems posed by the lack of contractivity and non-uniqueness of solutions to the Bellman operator, in the multichain setting an optimal policy must solve the navigation subproblem of steering towards the best connected component, in addition to optimizing long-run performance within each component. We develop algorithms which better solve this navigational subproblem in order to achieve faster convergence for multichain MDPs, obtaining improved rates of convergence and sharper measures of complexity relative to prior work. Many key components of our results are of potential independent interest, including novel connections between average-reward and discounted problems, optimal fixed-point methods for discounted VI which extend to general Banach spaces, new sublinear convergence rates for the discounted value error, and refined suboptimality decompositions for multichain MDPs. Overall our results yield faster convergence rates for discounted and average-reward problems and expand the theoretical foundations of VI approaches.
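The discounted Bellman operator underlying these VI analyses can be stated in a few lines. Below is a minimal NumPy sketch of classical discounted value iteration on a tabular MDP (array shapes are assumptions for illustration); it shows the fixed-point view of VI, not the paper's accelerated multichain algorithms:

```python
import numpy as np

def value_iteration(P, r, gamma, tol=1e-8, max_iters=10_000):
    """Classical discounted value iteration, V <- max_a (r_a + gamma * P_a V).

    P: array of shape (A, S, S), row-stochastic transition matrix per action.
    r: array of shape (A, S), expected one-step reward per action and state.
    gamma: discount factor in [0, 1).
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iters):
        # Bellman optimality operator: Q[a, s] = r[a, s] + gamma * P[a, s, :] @ V
        Q = r + gamma * np.einsum("asx,x->as", P, V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

Because the operator is a gamma-contraction in the sup-norm, the iterates converge geometrically; the paper's contribution is precisely about settings (average reward, multichain structure) where this contraction is unavailable.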


Towards Weaker Variance Assumptions for Stochastic Optimization

Alacaoglu, Ahmet, Malitsky, Yura, Wright, Stephen J.

arXiv.org Machine Learning

We revisit a classical assumption for analyzing stochastic gradient algorithms where the squared norm of the stochastic subgradient (or the variance for smooth problems) is allowed to grow as fast as the squared norm of the optimization variable. We contextualize this assumption in view of its inception in the 1960s, its seemingly independent appearance in the recent literature, its relationship to weakest-known variance assumptions for analyzing stochastic gradient algorithms, and its relevance in deterministic problems for non-Lipschitz nonsmooth convex optimization. We build on and extend a connection recently made between this assumption and the Halpern iteration. For convex nonsmooth, and potentially stochastic, optimization, we analyze horizon-free, anytime algorithms with last-iterate rates. For problems beyond simple constrained optimization, such as convex problems with functional constraints or regularized convex-concave min-max problems, we obtain rates for optimality measures that do not require boundedness of the feasible set.
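The Halpern iteration referenced above anchors each step back to the starting point with a decaying weight. A minimal deterministic sketch with the classical anchor weights beta_k = 1/(k+2) (the paper's algorithms and step-size choices differ):

```python
import numpy as np

def halpern(T, x0, num_iters=1000):
    """Halpern iteration x_{k+1} = beta_k * x0 + (1 - beta_k) * T(x_k),
    with the classical anchor weights beta_k = 1/(k+2), which yield an
    O(1/k) rate on the fixed-point residual ||x_k - T(x_k)|| for a
    nonexpansive map T."""
    x = np.asarray(x0, dtype=float)
    anchor = x.copy()
    for k in range(num_iters):
        beta = 1.0 / (k + 2)
        x = beta * anchor + (1.0 - beta) * T(x)
    return x
```

The anchoring term is what gives Halpern-type methods their last-iterate guarantees, in contrast to averaged (Krasnoselskii-Mann) iterations.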


Asymptotic regularity of a generalised stochastic Halpern scheme with applications

Pischke, Nicholas, Powell, Thomas

arXiv.org Artificial Intelligence

We provide abstract, general and highly uniform rates of asymptotic regularity for a generalized stochastic Halpern-style iteration, which incorporates a second mapping in the style of a Krasnoselskii-Mann iteration. This iteration is general in two ways: first, it incorporates stochasticity in a completely abstract way rather than fixing a sampling method; secondly, it includes as special cases stochastic versions of various schemes from the optimization literature, including Halpern's iteration as well as a Krasnoselskii-Mann iteration with Tikhonov regularization terms in the sense of Bot, Csetnek and Meier. For these particular cases, we in particular obtain linear rates of asymptotic regularity, matching (or improving) the currently best known rates for these iterations in stochastic optimization, and quadratic rates of asymptotic regularity are obtained in the context of inner product spaces for the general iteration. We utilize these rates to give bounds on the oracle complexity of such iterations under suitable variance assumptions and batching strategies, again presented in an abstract style. Finally, we sketch how the schemes presented here can be instantiated in the context of reinforcement learning to yield novel methods for Q-learning.

Keywords: Asymptotic regularity, Halpern iteration, Tikhonov regularization, Q-learning, proof mining
MSC2020 Classification: 47J25, 47H09, 62L20, 03F10

1. Introduction

Approximating fixed points of nonexpansive mappings is one of the most fundamental tasks in nonlinear analysis and optimization. The problem becomes particularly interesting when we only have noisy versions of those mappings, in which case the resulting approximation methods become stochastic processes. Concrete examples of this general situation include model-free reinforcement learning algorithms, where variants of Q-learning, for instance, can be viewed as stochastic methods for computing fixed points of nonexpansive operators.
To be more concrete, let (X, ‖·‖) be a separable real normed space and T, U : X → X be two nonexpansive mappings on X, i.e. ‖Tx − Ty‖ ≤ ‖x − y‖ and ‖Ux − Uy‖ ≤ ‖x − y‖ for all x, y ∈ X.
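One plausible instantiation of such a generalized scheme blends a Halpern anchor with a Krasnoselskii-Mann-style average of the two mappings T and U. The weight schedule below is illustrative only and is not the paper's exact scheme:

```python
import numpy as np

def halpern_km(T, U, x0, num_iters=1000):
    """Sketch of a Halpern-style scheme incorporating a second nonexpansive
    mapping U in Krasnoselskii-Mann fashion (weights are illustrative):
        x_{n+1} = a_n * x0 + (1 - a_n) * ((1 - b_n) * U(x_n) + b_n * T(x_n))
    """
    x = np.asarray(x0, dtype=float)
    anchor = x.copy()
    for n in range(num_iters):
        a = 1.0 / (n + 2)   # Halpern anchor weight, decaying to zero
        b = 0.5             # KM averaging weight between U and T
        x = a * anchor + (1 - a) * ((1 - b) * U(x) + b * T(x))
    return x
```

Taking U to be the identity recovers a Halpern iteration with KM-style relaxation; other choices of U recover the Tikhonov-regularized variants mentioned in the abstract.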


Stochastic Halpern iteration in normed spaces and applications to reinforcement learning

Bravo, Mario, Contreras, Juan Pablo

arXiv.org Machine Learning

We analyze the oracle complexity of the stochastic Halpern iteration with variance reduction, where we aim to approximate fixed points of nonexpansive and contractive operators in a normed finite-dimensional space. We show that if the underlying stochastic oracle has uniformly bounded variance, our method exhibits an overall oracle complexity of $\tilde{O}(\varepsilon^{-5})$, improving recent rates established for the stochastic Krasnoselskii-Mann iteration. Also, we establish a lower bound of $\Omega(\varepsilon^{-3})$, which applies to a wide range of algorithms, including all averaged iterations even with minibatching. Using a suitable modification of our approach, we derive a $O(\varepsilon^{-2}(1-\gamma)^{-3})$ complexity bound in the case in which the operator is a $\gamma$-contraction. As an application, we propose new synchronous algorithms for average reward and discounted reward Markov decision processes. In particular, for the average reward, our method improves on the best-known sample complexity.
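A simple way to see how variance reduction enters a stochastic Halpern scheme is minibatching with growing batch sizes: averaging batch_k noisy operator evaluations shrinks the variance of the estimate at step k. The batch-growth and weight schedules below are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_halpern(T_sample, x0, num_iters=200, batch0=4):
    """Sketch of a stochastic Halpern iteration with minibatching as a basic
    variance-reduction device: at step k we average batch_k noisy evaluations
    of the operator, so the variance of the estimate decays as O(1/batch_k).
    Schedules are illustrative only."""
    x = np.asarray(x0, dtype=float)
    anchor = x.copy()
    for k in range(num_iters):
        beta = 1.0 / (k + 2)
        batch = batch0 * (k + 1)  # growing batch tightens the operator estimate
        Tx = np.mean([T_sample(x) for _ in range(batch)], axis=0)
        x = beta * anchor + (1.0 - beta) * Tx
    return x
```

The total oracle cost is the sum of the batch sizes, which is exactly the quantity the complexity bounds in the abstract account for.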


An Inexact Halpern Iteration with Application to Distributionally Robust Optimization

Liang, Ling, Toh, Kim-Chuan, Zhu, Jia-Jie

arXiv.org Artificial Intelligence

The Halpern iteration for solving monotone inclusion problems has gained increasing interest in recent years due to its simple form and appealing convergence properties. In this paper, we investigate the inexact variants of the scheme in both deterministic and stochastic settings. We conduct extensive convergence analysis and show that by choosing the inexactness tolerances appropriately, the inexact schemes admit an $O(k^{-1})$ convergence rate in terms of the (expected) residual norm. Our results relax the state-of-the-art inexactness conditions employed in the literature while sharing the same competitive convergence properties. We then demonstrate how the proposed methods can be applied for solving two classes of data-driven Wasserstein distributionally robust optimization problems that admit convex-concave min-max optimization reformulations. We highlight their capability of performing inexact computations for distributionally robust learning with stochastic first-order methods.
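For the inclusion 0 ∈ A(x), each Halpern step applies the resolvent (I + cA)^{-1}, which in practice is often only computed approximately. The sketch below, for the special case A = ∇f, approximates the resolvent by a few gradient steps on the proximal subproblem; the inner-accuracy schedule is an illustrative assumption, not the paper's tolerance conditions:

```python
import numpy as np

def inexact_halpern(grad_f, x0, c=1.0, num_iters=300):
    """Sketch of an inexact Halpern iteration for 0 in grad_f(x): the resolvent
    (I + c * grad_f)^{-1} is approximated by gradient descent on the proximal
    subproblem min_y f(y) + ||y - x||^2 / (2c). The schedule that spends more
    inner steps as k grows is illustrative only."""
    x = np.asarray(x0, dtype=float)
    anchor = x.copy()
    for k in range(num_iters):
        # Inexact resolvent: gradient steps on y -> f(y) + ||y - x||^2 / (2c)
        y = x.copy()
        for _ in range(5 + k // 20):  # tighten inner accuracy along the run
            y = y - 0.2 * (grad_f(y) + (y - x) / c)
        beta = 1.0 / (k + 2)
        x = beta * anchor + (1.0 - beta) * y
    return x
```

The point of the paper's analysis is how loosely the inner subproblem can be solved while preserving the exact scheme's $O(k^{-1})$ residual rate.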


Extending the Reach of First-Order Algorithms for Nonconvex Min-Max Problems with Cohypomonotonicity

Alacaoglu, Ahmet, Kim, Donghwan, Wright, Stephen J.

arXiv.org Artificial Intelligence

We focus on constrained, $L$-smooth, nonconvex-nonconcave min-max problems either satisfying $\rho$-cohypomonotonicity or admitting a solution to the $\rho$-weakly Minty Variational Inequality (MVI), where larger values of the parameter $\rho>0$ correspond to a greater degree of nonconvexity. These problem classes include examples in two player reinforcement learning, interaction dominant min-max problems, and certain synthetic test problems on which classical min-max algorithms fail. It has been conjectured that first-order methods can tolerate values of $\rho$ no larger than $\frac{1}{L}$, but existing results in the literature have stagnated at the tighter requirement $\rho < \frac{1}{2L}$. With a simple argument, we obtain optimal or best-known complexity guarantees with cohypomonotonicity or weak MVI conditions for $\rho < \frac{1}{L}$. The algorithms we analyze are inexact variants of Halpern and Krasnosel'ski\u{\i}-Mann (KM) iterations. We also provide algorithms and complexity guarantees in the stochastic case with the same range on $\rho$. Our main insight for the improvements in the convergence analyses is to harness the recently proposed "conic nonexpansiveness" property of operators. As byproducts, we provide a refined analysis for inexact Halpern iteration and propose a stochastic KM iteration with a multilevel Monte Carlo estimator.
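The KM iteration mentioned alongside Halpern above is the other basic building block these inexact variants start from: a fixed relaxation toward T rather than an anchor toward x0. A minimal sketch for a nonexpansive T (the paper analyzes inexact and stochastic variants of this template):

```python
import numpy as np

def krasnoselskii_mann(T, x0, num_iters=1000, lam=0.5):
    """Krasnoselskii-Mann iteration x_{k+1} = (1 - lam) * x_k + lam * T(x_k)
    for a nonexpansive map T, with a fixed relaxation parameter lam in (0, 1)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = (1 - lam) * x + lam * T(x)
    return x
```

Unlike Halpern, KM carries no anchor term, which is why its guarantees are typically on averaged rather than last iterates; the paper's "conic nonexpansiveness" argument is what extends both templates to the cohypomonotone range $\rho < \frac{1}{L}$.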